Initial Replication Study

Figure 4

First, we get the data. We downloaded it fresh from Google Trends, requesting all three search terms together in a single comparison.

immigration_data <- read_csv('from_google_trends/google_trends_collective.csv', skip=3, col_names=c('month', 'welfare_trend', 'crime_trend', 'report_trend')) |>
    mutate(month = parse_date(month, "%Y-%m")) |>
    mutate(president=as.factor(ifelse(year(month) <= 2008, "Bush", ifelse(year(month) <= 2016, "Obama", "Trump"))))
## Rows: 192 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (3): welfare_trend, crime_trend, report_trend
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now, we recreate the plot.

immigration_data |>
    pivot_longer(names_to = "trend_type", values_to = "trend_value", cols = ends_with("_trend")) |>
    mutate(trend_type=factor(trend_type, levels=c("report_trend", "crime_trend", "welfare_trend"))) |>  # levels (not labels) reorders the facets without renaming them
    ggplot(aes(x=month, y=trend_value, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    facet_wrap(~ trend_type, dir="v")

Table 3

Now we reproduce Table 3.

table_3_data <- immigration_data |>
    mutate(is_bush = president == "Bush", is_trump = president == "Trump")

models <- list(
    crime = lm(crime_trend ~ month + is_bush + is_trump, data = table_3_data),
    welfare = lm(welfare_trend ~ month + is_bush + is_trump, data = table_3_data),
    report = lm(report_trend ~ month + is_bush + is_trump, data = table_3_data)
)

msummary(models, output='markdown')
| | crime | welfare | report |
|---|---|---|---|
| (Intercept) | -31.109 | -17.690 | 60.338 |
| | (13.241) | (10.542) | (16.691) |
| month | 0.003 | 0.002 | -0.001 |
| | (0.001) | (0.001) | (0.001) |
| is_bushTRUE | 2.836 | 1.673 | -0.748 |
| | (2.388) | (1.901) | (3.010) |
| is_trumpTRUE | 7.348 | 3.798 | 15.219 |
| | (2.294) | (1.826) | (2.892) |
| Num.Obs. | 192 | 192 | 192 |
| R2 | 0.351 | 0.265 | 0.197 |
| R2 Adj. | 0.341 | 0.253 | 0.184 |
| AIC | 1345.8 | 1258.3 | 1434.7 |
| BIC | 1362.1 | 1274.6 | 1451.0 |
| Log.Lik. | -667.905 | -624.134 | -712.370 |
| RMSE | 7.84 | 6.24 | 9.89 |

We are getting different values for the model. Not only are the intercepts off, but they have welfare decreasing over time, while we have it increasing. They also find large positive Bush effects, where we find small positive ones for crime and welfare and, for reporting, a small negative effect where they find a small positive one. And they find larger Trump effects than we do.

But this still supports their point from this table, which was that Trump's election led to a large increase in all of these searches, and we do find that.

table_3_data |>
    add_predictions(models[["report"]]) |>
    ggplot(aes(x=month, y=report_trend, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    geom_line(aes(y=pred), linetype="dashed")

table_3_data |>
    add_predictions(models[["crime"]]) |>
    ggplot(aes(x=month, y=crime_trend, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    geom_line(aes(y=pred), linetype="dashed")

table_3_data |>
    add_predictions(models[["welfare"]]) |>
    ggplot(aes(x=month, y=welfare_trend, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    geom_line(aes(y=pred), linetype="dashed")

This looks very different from our original plot, but that makes sense. Before, geom_smooth fit a separate line (slope and intercept) for each presidency, because the data were grouped by the president aesthetic. Since the model here only shifts the intercept by president, we have one slope for everything.
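
The distinction can be seen in miniature with base R (invented data; `common` and `varying` are names introduced here for illustration): adding a group dummy shifts only the intercept, while interacting with `*` gives each group its own slope.

```r
# Toy sketch with invented data: dummy-only vs. interaction model.
set.seed(1)
x <- 1:40
g <- factor(rep(c("A", "B"), each = 20))
y <- ifelse(g == "A", 2 * x, 0.5 * x) + rnorm(40)

common  <- lm(y ~ x + g)  # one shared slope, per-group intercepts
varying <- lm(y ~ x * g)  # the x:gB term lets each group have its own slope

names(coef(common))   # "(Intercept)" "x" "gB"
names(coef(varying))  # adds "x:gB"
```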

Figures 2 and 3

Now, we get the topic model data.

load("TopicModel.RData")
document_topics <- make.dt(immigrFit, meta = out$meta)
topic_terms <- t(exp(immigrFit$beta$logbeta[[1]]))
rownames(topic_terms) <- out$vocab
colnames(topic_terms) <- sprintf("Topic%d", 1:ncol(topic_terms))

document_topics <- document_topics |>
    mutate(date = ymd(date))

First, we make figure 2. It is about immigration coverage in general, not of a specific category, so I think all the documents there count.

campaign_start <- document_topics |>
    filter(time == "pre-election") |>
    arrange(date) |>
    slice_tail(n=1) |>
    pull(date)

campaign_end <- document_topics |>
    filter(time == "post-election") |>
    arrange(date) |>
    slice_head(n=1) |>
    pull(date)

document_topics |>
    mutate(month = floor_date(date, "month")) |>
    group_by(month, channel, time) |>
    summarize(num_segments = n()) |>
    ggplot(aes(x=month, y=num_segments, color=channel)) +
    geom_point() +
    geom_smooth(aes(group = interaction(channel, time)), se=F) +
    geom_vline(aes(xintercept=campaign_start), linetype="dashed") +
    geom_vline(aes(xintercept=campaign_end), linetype="dashed") +
    scale_color_manual(values=c("magenta", "red", "blue"))
## `summarise()` has grouped output by 'month', 'channel'. You can override using
## the `.groups` argument.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

We have essentially the same results as they do, but with a less severe jump after the campaign start and a bigger drop before the inauguration. Their overall findings still hold.

document_topics |>
    mutate(month = floor_date(date, "month")) |>
    group_by(month, channel, time) |>
    summarize(proportion_1 = sum(Topic1), proportion_3 = sum(Topic3), crime_coverage = proportion_1 + proportion_3) |>
    ggplot(aes(x=month, y=crime_coverage, color=channel)) +
    geom_point() +
    geom_smooth(aes(group = interaction(channel, time)), se=F) +
    geom_vline(aes(xintercept=campaign_start), linetype="dashed") +
    geom_vline(aes(xintercept=campaign_end), linetype="dashed") +
    scale_color_manual(values=c("magenta", "red", "blue"))
## `summarise()` has grouped output by 'month', 'channel'. You can override using
## the `.groups` argument.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

document_topics |>
    mutate(month = floor_date(date, "month")) |>
    group_by(month, channel, time) |>
    summarize(welfare_coverage = sum(Topic13)) |>
    ggplot(aes(x=month, y=welfare_coverage, color=channel)) +
    geom_point() +
    geom_smooth(aes(group = interaction(channel, time)), se=F) +
    geom_vline(aes(xintercept=campaign_start), linetype="dashed") +
    geom_vline(aes(xintercept=campaign_end), linetype="dashed") +
    scale_color_manual(values=c("magenta", "red", "blue"))
## `summarise()` has grouped output by 'month', 'channel'. You can override using
## the `.groups` argument.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

This time, we get identical results. I suspect that in the first figure we only differed because of how we handle the months bisected by the dotted lines. We were consistent, always treating such a month as multiple points, which causes our smoothed lines to cross the dotted lines; they only seem to have done that for the second figure, for some reason.
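
A base-R sketch of that bookkeeping (dates invented): grouping by both month and pre/post period makes a month straddling the cutoff contribute one point per side, which is what pushes a smoothed line across the dashed boundary.

```r
# A month straddling the 2016-11-08 cutoff yields two grouped rows.
dates  <- as.Date(c("2016-11-01", "2016-11-07", "2016-11-09", "2016-11-30"))
period <- ifelse(dates > as.Date("2016-11-08"), "post", "pre")
mon    <- format(dates, "%Y-%m")

counts <- aggregate(list(n = seq_along(dates)),
                    by = list(month = mon, period = period), FUN = length)
counts  # 2016-11 appears twice: once "pre", once "post"
```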

Table 4 Monthly

Now we reproduce Table 4. This comes before we obtained the daily data, so we work at the monthly level.

topic_table <- document_topics |>
    mutate(month = floor_date(date, "month")) |>
    group_by(month, time) |>
    summarize(proportion_1 = sum(Topic1), proportion_3 = sum(Topic3), crime_coverage = proportion_1 + proportion_3, welfare_coverage = sum(Topic13), num_segments = n()) |>
    mutate(trump_admin = time == "post-election", month_of_year=month(month, label=T)) |>
    inner_join(immigration_data, by="month")
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
lm(report_trend ~ num_segments + crime_coverage + welfare_coverage + trump_admin + month + month_of_year, topic_table) |>
    msummary(output='markdown')
| | (1) |
|---|---|
| (Intercept) | 83.418 |
| | (54.312) |
| num_segments | 0.000 |
| | (0.003) |
| crime_coverage | 0.084 |
| | (0.046) |
| welfare_coverage | 0.291 |
| | (0.145) |
| trump_adminTRUE | 15.082 |
| | (3.821) |
| month | -0.003 |
| | (0.003) |
| month_of_year.L | -11.362 |
| | (3.452) |
| month_of_year.Q | 9.569 |
| | (3.306) |
| month_of_year.C | 0.599 |
| | (3.358) |
| month_of_year^4 | 0.059 |
| | (3.395) |
| month_of_year^5 | 0.713 |
| | (3.442) |
| month_of_year^6 | -2.255 |
| | (3.461) |
| month_of_year^7 | 3.271 |
| | (3.442) |
| month_of_year^8 | -0.610 |
| | (3.400) |
| month_of_year^9 | -1.815 |
| | (3.345) |
| month_of_year^10 | -3.696 |
| | (3.262) |
| month_of_year^11 | 0.880 |
| | (3.241) |
| Num.Obs. | 71 |
| R2 | 0.715 |
| R2 Adj. | 0.630 |
| AIC | 512.7 |
| BIC | 553.4 |
| Log.Lik. | -238.356 |
| RMSE | 6.95 |

Obviously, this differs from their result because it is at the monthly level. But we found substantially weaker effects of crime and welfare coverage, and no effect from the number of segments. Aggregation alone shouldn't change these; perhaps the effect only exists in the short term (though if so, that changes the paper's conclusions). We did still find a large coefficient for Trump's presidency, presumably because that is a long-term effect.

Figure 4 and Table 3 Using Replication Files

We want to recreate figure 4 and table 3 using the data they used to see if we still get the same results.

report_data <- read_csv('from_replication_files/google_trends_report.csv') |>
    rename(report_trend = search)
## Rows: 190 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (2): year, search
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crime_data <- read_csv('from_replication_files/google_trends_crime.csv') |>
    rename(crime_trend = search)
## Rows: 190 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (2): year, search
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
welfare_data <- read_csv('from_replication_files/google_trends_welfare.csv') |>
    rename(welfare_trend = search)
## Rows: 190 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (2): year, search
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trend_data_given <- report_data |>
    inner_join(crime_data, by=c("year", "month")) |>
    inner_join(welfare_data, by=c("year", "month")) |>
    mutate(month = paste(year, month), month = parse_date(month, "%Y %m")) |>
    mutate(president=as.factor(ifelse(year(month) <= 2008, "Bush", ifelse(year(month) <= 2016, "Obama", "Trump"))))

trend_data_given |>
    pivot_longer(names_to = "trend_type", values_to = "trend_value", cols = ends_with("_trend")) |>
    mutate(trend_type=factor(trend_type, levels=c("report_trend", "crime_trend", "welfare_trend"))) |>  # levels (not labels) reorders the facets without renaming them
    ggplot(aes(x=month, y=trend_value, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    facet_wrap(~ trend_type, dir="v")

table_3_data_given <- trend_data_given |>
    mutate(is_bush = president == "Bush", is_trump = president == "Trump")

list(
    crime = lm(crime_trend ~ month + is_bush + is_trump, data = table_3_data_given),
    welfare = lm(welfare_trend ~ month + is_bush + is_trump, data = table_3_data_given),
    report = lm(report_trend ~ month + is_bush + is_trump, data = table_3_data_given)
) |> msummary(output='markdown')
| | crime | welfare | report |
|---|---|---|---|
| (Intercept) | -4.163 | 28.585 | 72.574 |
| | (23.241) | (20.710) | (16.089) |
| month | 0.001 | 0.000 | -0.002 |
| | (0.001) | (0.001) | (0.001) |
| is_bushTRUE | 8.280 | 7.819 | 0.785 |
| | (4.187) | (3.731) | (2.899) |
| is_trumpTRUE | 19.903 | 18.560 | 19.716 |
| | (4.027) | (3.588) | (2.788) |
| Num.Obs. | 190 | 190 | 190 |
| R2 | 0.262 | 0.237 | 0.286 |
| R2 Adj. | 0.250 | 0.225 | 0.275 |
| AIC | 1544.4 | 1500.6 | 1404.7 |
| BIC | 1560.7 | 1516.8 | 1420.9 |
| Log.Lik. | -767.216 | -745.305 | -697.339 |
| RMSE | 13.72 | 12.23 | 9.50 |

With this, we get the same data as they do. We still don't get quite the same regression, but we get closer. Interestingly, they often report non-zero time trends where we find trends of zero.

But it's also true that they downloaded this in three separate files, while we did it in a single file. Since Google Trends scales terms relative to one another when you download them together, it's possible that this changes things.
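
A quick base-R illustration of that scaling difference (the raw volumes here are invented): downloaded together, all series are scaled to the single largest peak across terms; downloaded separately, each series is scaled to its own peak.

```r
# Hypothetical raw search volumes for two terms.
raw_a <- c(10, 20, 40)
raw_b <- c(5, 50, 100)

# Together: everything is scaled to the largest value across both terms.
peak <- max(raw_a, raw_b)
together_a <- round(100 * raw_a / peak)  # tops out at 40, not 100
together_b <- round(100 * raw_b / peak)

# Separately: each term is scaled to its own maximum.
separate_a <- round(100 * raw_a / max(raw_a))  # tops out at 100
```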

Table 4 Using Replication Daily Data

Now we will replicate Table 4 using the given day values.

given_daily_data <- read_csv('from_replication_files/gt_report_daily.csv')
## New names:
## • `` -> `...1`
## Rows: 5751 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): ...1, search, search_adj
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
topic_table_daily <- document_topics |>
    group_by(date, time) |>
    summarize(proportion_1 = sum(Topic1), proportion_3 = sum(Topic3), crime_coverage = proportion_1 + proportion_3, welfare_coverage = sum(Topic13), num_segments = n()) |>
    mutate(trump_admin = time == "post-election", month_of_year=month(date, label=T), day_of_week=wday(date, label=T)) |>
    inner_join(given_daily_data, by="date")
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
list(
    search_model=lm(search ~ num_segments + crime_coverage + welfare_coverage + trump_admin + date + day_of_week + month_of_year, topic_table_daily),
    search_adj_model=lm(search_adj ~ num_segments + crime_coverage + welfare_coverage + trump_admin + date + day_of_week + month_of_year, topic_table_daily)
)|>
    msummary(output='markdown')
| | search_model | search_adj_model |
|---|---|---|
| (Intercept) | 97.103 | 112.333 |
| | (23.412) | (23.691) |
| num_segments | 0.049 | 0.059 |
| | (0.018) | (0.018) |
| crime_coverage | 0.006 | 0.596 |
| | (0.263) | (0.266) |
| welfare_coverage | 1.319 | 1.895 |
| | (0.677) | (0.685) |
| trump_adminTRUE | 7.750 | 19.614 |
| | (1.672) | (1.692) |
| date | -0.003 | -0.005 |
| | (0.001) | (0.001) |
| day_of_week.L | 1.139 | 1.356 |
| | (1.090) | (1.103) |
| day_of_week.Q | -7.004 | -6.748 |
| | (1.094) | (1.107) |
| day_of_week.C | -0.031 | -0.627 |
| | (1.088) | (1.101) |
| day_of_week^4 | -0.635 | -1.020 |
| | (1.087) | (1.100) |
| day_of_week^5 | 2.848 | 2.523 |
| | (1.087) | (1.100) |
| day_of_week^6 | -1.804 | -1.732 |
| | (1.085) | (1.098) |
| month_of_year.L | 4.413 | -9.821 |
| | (1.488) | (1.506) |
| month_of_year.Q | 1.605 | 8.287 |
| | (1.435) | (1.452) |
| month_of_year.C | 1.615 | -2.743 |
| | (1.451) | (1.468) |
| month_of_year^4 | 4.555 | 2.751 |
| | (1.449) | (1.467) |
| month_of_year^5 | -0.484 | 2.042 |
| | (1.447) | (1.464) |
| month_of_year^6 | -4.023 | -4.824 |
| | (1.456) | (1.474) |
| month_of_year^7 | 3.080 | 1.403 |
| | (1.438) | (1.455) |
| month_of_year^8 | -1.738 | -0.317 |
| | (1.432) | (1.449) |
| month_of_year^9 | 1.755 | -1.039 |
| | (1.429) | (1.446) |
| month_of_year^10 | -3.194 | -0.823 |
| | (1.404) | (1.420) |
| month_of_year^11 | 0.472 | 1.480 |
| | (1.392) | (1.409) |
| Num.Obs. | 2030 | 2030 |
| R2 | 0.089 | 0.266 |
| R2 Adj. | 0.079 | 0.258 |
| AIC | 17633.7 | 17681.8 |
| BIC | 17768.5 | 17816.6 |
| Log.Lik. | -8792.849 | -8816.907 |
| RMSE | 18.40 | 18.62 |

From here, we see that while neither model is exactly identical to theirs, the one using the adjusted values is much closer to what they got. The non-adjusted version finds a much weaker effect, with crime coverage having an ambiguous effect.

Looking at the data, the problem seems to be that daily values can only be downloaded for one time period (say, a month) at a time, so each month's values are only scaled against other searches within that month. But I don't understand how their normalization works.
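
One plausible adjustment, sketched in base R. This is a guess at the idea, not their documented procedure, and the `adj` column and all values here are invented: rescale each day's within-month score by that month's level from the monthly series, so days become comparable across months.

```r
# Invented example: daily scores are only comparable within their own month.
daily <- data.frame(
    date   = as.Date(c("2017-01-05", "2017-01-20", "2017-02-10")),
    search = c(50, 100, 100)              # within-month 0-100 scores
)
monthly <- c("2017-01" = 80, "2017-02" = 40)  # hypothetical monthly series

# Multiply each day's ratio-to-100 by its month's level.
daily$adj <- daily$search / 100 * unname(monthly[format(daily$date, "%Y-%m")])
daily$adj  # 40 80 40: January ratios preserved, February pulled down
```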

Figure 4 and Table 3 Using Individually Downloaded Data

We noticed that they downloaded each query separately. Since Google Trends scales every term in a comparison relative to the highest peak among them, if we want to replicate their results we need to download the terms separately, as the researchers seem to have done. We downloaded the immigrant report, crime, and welfare trends individually.

report_data <- read_csv('from_google_trends/report_trend_individual.csv', skip=3, col_names=c('month', 'report_trend'))
## Rows: 192 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (1): report_trend
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crime_data <- read_csv('from_google_trends/crime_trend_individual.csv', skip=3, col_names=c('month', 'crime_trend'))
## Rows: 192 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (1): crime_trend
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
welfare_data <- read_csv('from_google_trends/welfare_trend_individual.csv', skip=3, col_names=c('month', 'welfare_trend'))
## Rows: 192 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): month
## dbl (1): welfare_trend
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trend_data_individual <- report_data |>
    inner_join(crime_data, by=c("month")) |>
    inner_join(welfare_data, by=c("month")) |>
    mutate(month = parse_date(month, "%Y-%m")) |>
    mutate(president=as.factor(ifelse(year(month) <= 2008, "Bush", ifelse(year(month) <= 2016, "Obama", "Trump"))))

trend_data_individual |>
    pivot_longer(names_to = "trend_type", values_to = "trend_value", cols = ends_with("_trend")) |>
    mutate(trend_type=factor(trend_type, levels=c("report_trend", "crime_trend", "welfare_trend"))) |>  # levels (not labels) reorders the facets without renaming them
    ggplot(aes(x=month, y=trend_value, color=president)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    facet_wrap(~ trend_type, dir="v")

table_3_data_individual <- trend_data_individual |>
    mutate(is_bush = president == "Bush", is_trump = president == "Trump")

list(
    crime = lm(crime_trend ~ month + is_bush + is_trump, data = table_3_data_individual),
    welfare = lm(welfare_trend ~ month + is_bush + is_trump, data = table_3_data_individual),
    report = lm(report_trend ~ month + is_bush + is_trump, data = table_3_data_individual)
) |> msummary(output='markdown')
| | crime | welfare | report |
|---|---|---|---|
| (Intercept) | -57.817 | -20.711 | 68.858 |
| | (23.105) | (27.392) | (17.014) |
| month | 0.005 | 0.003 | -0.002 |
| | (0.001) | (0.002) | (0.001) |
| is_bushTRUE | 6.951 | 4.410 | -1.618 |
| | (4.166) | (4.939) | (3.068) |
| is_trumpTRUE | 12.422 | 12.235 | 16.077 |
| | (4.003) | (4.746) | (2.948) |
| Num.Obs. | 192 | 192 | 192 |
| R2 | 0.339 | 0.190 | 0.190 |
| R2 Adj. | 0.329 | 0.177 | 0.178 |
| AIC | 1559.6 | 1625.0 | 1442.1 |
| BIC | 1575.9 | 1641.2 | 1458.4 |
| Log.Lik. | -774.798 | -807.478 | -716.049 |
| RMSE | 13.69 | 16.23 | 10.08 |

But the paper still has welfare searches going up during Trump's presidency, while we have them going down slightly. And in the table, they have welfare going down over time, while we have it going up. So this discrepancy doesn't explain anything.

Extension

The paper says: “We also expect that when the Trump administration receives media coverage for anti-immigrant rhetoric or policies, we will see an uptick in reporting searches. However, we do not expect a discontinuity in reporting searches immediately after the 2016 election, nor do we expect media coverage of candidate Trump’s anti-immigrant rhetoric to generate more reporting searches. Neither candidate Trump nor president-elect Trump had the power to change immigration policy before his inauguration, so his anti-immigrant positions should not increase interest in reporting” (5-6). Smriti and I are skeptical of this claim, which the paper doesn’t seem to actually prove, so we are going to investigate it and see if we find any discontinuities, and whether coverage of Candidate Trump also increases the amount of reporting.

Candidate Trump Boosting Reports

So, we will build a regression where we interact candidacy stage (time) with Trump mentions via *, along with month, to see if candidate-era coverage boosts reporting searches.

trump_extension_table <- document_topics |>
    mutate(month = floor_date(date, "month")) |>
    group_by(month, time, trump) |>
    summarize(proportion_1 = sum(Topic1), proportion_3 = sum(Topic3), crime_coverage = proportion_1 + proportion_3, welfare_coverage = sum(Topic13), num_segments = n()) |>
    mutate(month_of_year=month(month, label=T)) |>
    inner_join(trend_data_individual, by="month")
## `summarise()` has grouped output by 'month', 'time'. You can override using the
## `.groups` argument.
list(
    lm(report_trend ~ num_segments + crime_coverage + welfare_coverage + time * trump + month + month_of_year, trump_extension_table),
    lm(report_trend ~ num_segments + crime_coverage + welfare_coverage + time * trump + month, trump_extension_table)
     ) |>
    msummary(output='markdown')
| | (1) | (2) |
|---|---|---|
| (Intercept) | 38.257 | 77.100 |
| | (54.189) | (60.811) |
| num_segments | 0.003 | 0.008 |
| | (0.004) | (0.004) |
| crime_coverage | 0.134 | 0.077 |
| | (0.061) | (0.065) |
| welfare_coverage | 0.232 | 0.159 |
| | (0.181) | (0.200) |
| timeelection | 0.493 | 0.897 |
| | (3.581) | (4.027) |
| timepost-election | 12.474 | 15.023 |
| | (5.191) | (5.857) |
| trump | 7.010 | 9.438 |
| | (3.952) | (4.352) |
| month | 0.000 | -0.003 |
| | (0.003) | (0.004) |
| month_of_year.L | -11.771 | |
| | (2.576) | |
| month_of_year.Q | 9.783 | |
| | (2.486) | |
| month_of_year.C | 0.025 | |
| | (2.549) | |
| month_of_year^4 | 1.969 | |
| | (2.533) | |
| month_of_year^5 | 0.136 | |
| | (2.567) | |
| month_of_year^6 | -4.019 | |
| | (2.552) | |
| month_of_year^7 | 3.259 | |
| | (2.576) | |
| month_of_year^8 | -0.975 | |
| | (2.551) | |
| month_of_year^9 | -1.938 | |
| | (2.508) | |
| month_of_year^10 | -3.669 | |
| | (2.491) | |
| month_of_year^11 | 1.678 | |
| | (2.453) | |
| timeelection × trump | -3.588 | -6.538 |
| | (4.615) | (5.165) |
| timepost-election × trump | 1.081 | -1.592 |
| | (4.270) | (4.785) |
| Num.Obs. | 131 | 131 |
| R2 | 0.679 | 0.540 |
| R2 Adj. | 0.621 | 0.506 |
| AIC | 941.5 | 966.6 |
| BIC | 1004.7 | 998.2 |
| Log.Lik. | -448.730 | -472.284 |
| RMSE | 7.44 | 8.90 |

If we include month-of-year along with our other variables, we get what the paper predicts: Trump's name produces a bigger boost post-election than during the campaign. However, since the trump coefficient (7.010) is larger in magnitude than the campaign-period interaction (-3.588), Trump still has a positive association with report searches even during the campaign. When we drop month-of-year, this remains the case.

Discontinuity on Election

Now, we check whether we can find a discontinuity in reporting search rates immediately after the election, in the same way the paper found one after the inauguration. We look at the discontinuity at those two dates, as well as at two arbitrary ones - September 1st before the election and April 1st afterwards - to see if we find similar discontinuities.

Since the monthly data available to us isn't precise enough for this, we use their adjusted daily search data.
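
The discontinuity measure used below is just the coefficient on an after-the-cut dummy in a linear-trend regression. A minimal base-R sketch with simulated data (the series and the jump of 20 are invented):

```r
# Simulated daily series with a known jump of 20 after day 100.
set.seed(42)
day <- 1:200
y   <- 0.05 * day + ifelse(day > 100, 20, 0) + rnorm(200)

# The dummy coefficient estimates the size of the jump at the cut.
fit <- lm(y ~ day + I(day > 100))
coef(fit)[["I(day > 100)TRUE"]]  # close to the true jump of 20
```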

discontinuity_data <- given_daily_data |>
    mutate(arbitrary_before = date > ymd("2016-09-01"), after_election = date > ymd("2016-11-8"), after_inauguration = date > ymd("2017-01-20"), arbitrary_after = date > ymd("2017-04-01")) |>
    filter(date >= ymd("2014-01-01"))

discontinuity_data |>
    pivot_longer(c(arbitrary_before, after_election, after_inauguration, arbitrary_after), names_to = "split_location", values_to = "split_values") |>
    ggplot(aes(x=date, y=search_adj, color=split_values)) +
    geom_point(alpha = 0.05) +
    geom_smooth(method=lm, formula = y ~ x, se=F) +
    facet_wrap(~ split_location)

So, showing a bunch of splits: while regressing separately before and after the inauguration shows a large discontinuity, so does regressing before and after the election, which the paper claimed wouldn't show one. With the arbitrary splits, the one afterwards shows almost no discontinuity, but the one beforehand does show one, albeit not as large. Because of this, I don't think a discontinuity between separate linear regressions pinpoints the split date very precisely, and I question its use in the paper.

We also wanted to see it as a regression so we could have a numerical measure of the discontinuity.

list(
    lm(search_adj ~ date + arbitrary_before, discontinuity_data),
    lm(search_adj ~ date + after_election, discontinuity_data),
    lm(search_adj ~ date + after_inauguration, discontinuity_data),
    lm(search_adj ~ date + arbitrary_after, discontinuity_data)
) |>
    msummary(output="markdown")
| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| (Intercept) | 71.010 | 141.576 | 155.995 | -81.162 |
| | (22.202) | (22.172) | (22.193) | (22.955) |
| date | -0.002 | -0.006 | -0.007 | 0.007 |
| | (0.001) | (0.001) | (0.001) | (0.001) |
| arbitrary_beforeTRUE | 16.113 | | | |
| | (1.707) | | | |
| after_electionTRUE | | 22.202 | | |
| | | (1.692) | | |
| after_inaugurationTRUE | | | 23.374 | |
| | | | (1.689) | |
| arbitrary_afterTRUE | | | | 2.312 |
| | | | | (1.749) |
| Num.Obs. | 2186 | 2186 | 2186 | 2186 |
| R2 | 0.107 | 0.139 | 0.146 | 0.072 |
| R2 Adj. | 0.106 | 0.138 | 0.145 | 0.071 |
| AIC | 19343.0 | 19264.6 | 19246.6 | 19428.8 |
| BIC | 19365.8 | 19287.4 | 19269.4 | 19451.5 |
| Log.Lik. | -9667.506 | -9628.312 | -9619.319 | -9710.376 |
| RMSE | 20.16 | 19.80 | 19.72 | 20.56 |

Here, we see that while the largest discontinuity is, in fact, after the inauguration, at 23.374, the election split has an almost equally large coefficient, 22.202 - in fact, since the difference is within the margin of error, we can't really say which split fits better. And while the earlier arbitrary date has a smaller discontinuity, 16.113, it is still large.

The discontinuity does show an increase in searches for reporting immigrants during Trump's term. But the presence of a discontinuity does not mean that the exact location chosen to split the data is itself meaningful.